perm filename OUT.4[DIS,DBL] blob
sn#208266 filedate 1976-03-31 generic text, type C, neo UTF8
COMMENT ā VALID 00013 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00003 00002 Title: Automating the Discovery of Mathematical Concepts
C00009 00003 OUTLINE FOR DISSERTATION: Level 2
C00015 00004 OUTLINE FOR DISSERTATION: Level 3
C00016 00005 Chapter 1: Overview
C00020 00006 Chapter 2: An example
C00022 00007 Chapter 3: Overall Performance
C00024 00008 Chapter 4: Design of AM
C00028 00009 Chapter 5: A few more examples
C00029 00010 Chapter 6: Theoretical Issues in Designing a Math Theorizer
C00031 00011 Chapter 7: A Model for Creative Discovery in Science
C00048 00012 Chapter 8: Concluding Discussion
C00049 00013 Appendices
C00050 ENDMK
Cā;
Title: Automating the Discovery of Mathematical Concepts
Thesis Committee: B. Buchanan, E. Feigenbaum, C. Green, D. Knuth, A. Newell
****************************************************************************
OUTLINE FOR DISSERTATION: Level 1
Abstract
Dedication and opening quotation
Table of Contents
Chapter 1: Overview
Chapter 2: Historian's Summary, plus 1 Example
Chapter 3: Design of AM
Chapter 4: Capabilities and Limitations
Chapter 5: A few more examples; plus: experiments on AM
Chapter 6: General Issues in Designing a Math Theorizer
plus: A Model for Creative Discovery in Science
Chapter 7: Concluding Discussion
OUTLINE FOR DISSERTATION: Level 2
Abstract
<Should be one of the last things written>
a) A 1-sentence summary of the thesis
b) A 1-paragraph summary of the thesis
Chapter 1: Overview
a) 1-page summary of the project
b) Motivation: why is this worthwhile?
c) A 10-page summary of the entire project
d) Guide to reading the remainder of the thesis
Chapter 2: An example
a) Open with a crisp, 2-page example, a sample excerpt of a session with AM.
b) Explain the example in more and more detail,
but always in English/Math terms; even though it will be necessary to
delve into the system's reasoning deeper and deeper, that too should
always remain informal, in English/Math. Ignore represen., control mechanism,...
Chapter 3: Overall Performance
a) What was the starting knowledge, the concepts/heuristics AM initially had?
b) What did AM do? What new concepts were created, facets filled in, conjectures...
c) Explicit discussion of AM's capabilities and limits
What are some notable omissions in AM's behavior? Can the user elicit these?
Chapter 4: Design and Implementation of AM
<a unified, detailed description of the representations, control strucs,
data strucs, job list, rippling, etc.>
Chapter 5: A few more examples
<This chapter assumes that the reader has read through the detailed Design chapter>
<Go over the examples in varying levels of detail. Occasionally, give a
"snapshot" of the new jobs, concepts, facet entries, etc.
Chapter 6: General Issues in Designing a Math Theorizer
<Theoretical issues, constraints. Model of math research.
This is "one way things could be done...". It is neither the sole
"correct" solution, nor is every nuance captured in the AM system itself.
Chapter 7: A Model for Creative Discovery in Science
<Similar to above, but generally applicable to any simulatable empirical science.
View AM as providing a suggestive model for how to emulate a researcher.
This model is of course a (modified) heuristic search paradigm.
Chapter 8: Concluding Discussion
What gives AM it's power? Identifying the crucial ideas.
Ultimately, what kinds of things could AM-like systems (never) do?
Experiments on AM one can perform. Uses for AM-like systems.
Implications for math education. Directions for Future Research.
Appendix 1: The Theory of Maximally-Divisible Numbers
Recap their discovery, clarify the role of AM and the human users/provers.
The way such a minitheory would be presented in a journal article.
Appendix 2: History of the "BEINGs" representation scheme
Origins in PUP. Fruition and development in PUP6. Application here in AM.
Contrast to other repr. schemes. What does this "give you"?
Relationship between AM and automatic programming.
Appendix 3: Some sample Concepts and Heuristics, as coded in LISP
Give the actual LISP versions, and comment them tob make them intelligible.
Appendix 4: Some traces of AM in action
Again, give the actual detailed trace, showing variables altered, etc.
Supplement this by a running commentary.
Appendix 5: External Pointers
Bibliography: annotated; worth of each work; amount of use it was to me.
Documentation: where the system lives, how to run it, related files/programs, etc.
Acknowledgements
OUTLINE FOR DISSERTATION: Level 3
Title page
Abstract
<200 words long. Should be one of the last things written>
A 1-sentence summary of the thesis
"Creative scientific discovery (at least in elementary mathematics)
can be successfully represented as a heuristic search process"
A 1-paragraph summary of the thesis
Dedication and opening quotation
Table of Contents
Chapter 1: Overview
a) 1-page summary of the project
<first pass: condense the 2-page blurb handed around earlier>
b) Motivation: why is this worthwhile?
<should this really be separate from the rest of the intro?
It is "meta" to the project, BUT so are most of the conclusions...>
Inherent interest of getting a handle on the task (sci. creativity)
Personal belief that discovery can be (ought to be) demystified
Potential for learning, from the system, more about the process
of sci. concept formation, thy. formation, chance discovery
(do experiments on the implementations: eg, vary AM's heurs)
Potential usefulness of the implementations themselves (including AM)
Aids to research; i.e., ultimately: new discoveries.
Potential to education: like Mycin, extract heurs. and teach them
All the usual bad reasons:
"Look ma, no hands" + maternal drives + ego + thesis drives +...
Historical:
Need task with no specific goal, to test BEINGs ideas.
Disenchantment with theorem-provers that plod along, in contrast
to the processes which my model of math demands: intu, need,
aesth., multiple reprs, proposing vs proving, fixed task.
c) A 10-page summary of the entire project
<first pass: condense the 20-page text of the 1-hour "whirlwind tour" talk>
<potential organization: mirror the overall organization of the thesis itself>
d) Guide to reading the remainder of the thesis
i) Overall organization of the thesis
ii) Plans for what to read (and in what order), depending on your interests
Plan for those interested in the AI ideas
Plan for those interested in the systems ideas
Plan for those interested in mathematics
iii) Pre-requisites and how to satisfy them, for each chapter
For those with little pure mathematics in their background
For those with little computer science background
For computer scientists with little contact to AI before
<either organized by "type" of reader, or by chapter/section>
Chapter 2: An example
a) Open with a good, 2-page example, a sample excerpt of a session with AM
Tentative choice: find examples of =, need to generalize =, discover "numbers"
b) Explain the example in more and more detail,
but always in English/Math terms; even though it will be necessary to
delve into the system's reasoning deeper and deeper, that too should
always remain informal, in English/Math. Don't worry about representation,
mechanism to pass control, etc.
e.g., we may say "when so few exs are found, it is natural to try to generalize
the predicate in question", without describing the precise way that heuristic is
stored, or how it accomplishes its action-part, or how it gets considered in the
first place. All THOSE things will be dealt with in chapter 5 (detailed examples)
e.g., we'll say "one way to fill in examples of a predicate is to find some
exs of its domain, then randomaly pick as many as are required, then run the
predicate and see if the result is T or F". We may want to go into detail about
finding out the domain of the predicate, getting examples of the domain, etc.,
but probably not at the level of GETP(EQUALITY DOMAIN) or GETP(SETS EXAMPLES).
Chapter 3: Overall Performance
a) The hard problem of judging the performance of AM-like system.
b) What was the starting knowledge, the concepts/heuristics AM initially had?
Essentially in English, describe the given knowledge, concept by concept.
c) What did AM do? What new concepts were created, facets filled in, conjectures...
Describe a typical run, including all the losers considerd, all the time spent
studying known/just-discovered concepts, etc. Go a little into AM's thread of
reasoning/motivation which carried it along that chain of activities.
d) Describe and evaluate the human engineering features (and humans' reactions)
The role ofthe user.
e) Explicit discussion of AM's capabilities and limits
What are some notable omissions in AM's behavior? Can the user elicit these?
What concepts can be elicited from AM now? Withing a little tuning/tiny additions?
What could proabably be done within a couple months of modifications?
Aside from a total change of domain, what kinds of activities does AM lack
(e.g., proof capabilitites), what concepts and discoveries are beyyond its design
limitations.
Chapter 4: Design of AM
<a unified, detailed description of the representations, control strucs,
data strucs, job list, rippling, etc.>
a) How shall a concept be represented internally?
Every concept consists merely of a bundle of facets.
Describe the idea of a fixed universal set of facets.
Illustrate the particular set of facets with a typical concept (composition)
Some particular points not to omit:
The interestingness facet; the worth facet.
The linked nature of the generalizations/specializations facets.
The linked nature of the examples/up-isa facets.
Equations which show how to efficiently locate desirable things:
EXS(z) = ((z . SPEC*) . EXS) . SPEC*
b) What about the heuristics?
Elicited in the context of a specific concept, so keep them tacked onto
that concept, as if they were normal facets.
Assumes a linearity which may not really be there! Alternative is to
maintain 2ān bundles of heuristics, and this is untenable.
Represented as situation-action rules. Describe left/right sides; exs.
c) Control structure
The idea of a job-list or agenda.
A job.
Task, priority, list of symbolic reasons.
Proposed by heuristics (give examples).
The priority-value scheme.
Gradual decay of priorities; when boosted; use of reason list.
Computation of global priority from local reasons.
The actual control structure: pick top job, execute, repeat.
Picking the best heurs/ relevant heuristics to use to execute job:
ripple away from the concept, gathering heuristics as you go
If a job totally fails, it can be reinstated on the agenda,
though typically with a much lower priority value.
Activation energy: distribute time/ list-cells resources.
3 kinds of activities (side-affects of executing a job):
new jobs get suggested and added to the agenda
facets of some concept(s) get filled in
brand new concepts get incarnated (and a couple facets filled in)
Chapter 5: A few more examples
<This chapter assumes that the reader has read through the detailed Design chapter>
<Go over the examples in varying levels of detail. Occasionally, give a
"snapshot" of the new jobs, concepts, facet entries, etc.>
Reconsider the example of discovering cardinality
Initial behavior of the system
Proposing unique factorization
Noticing a "real" number theory conjecture
Chapter 6: Theoretical Issues in Designing a Math Theorizer
<Theoretical issues, constraints. Model of math research.
This is "one way things could be done...". It is neither the sole
"correct" solution, nor is every nuance captured in the AM system itself.>
a) Research in various domains of sci., and of math, proceeds slightly differently
Some examples of this...
b) If we want a system to work in many domains, we'd have to sacrifice some power.
c) This brings up the choice of domain.
Why math? Why elementary set theory, instead of a more sophisticated field?
What else could it be/ not be? Why not a simpler filed like logic?
c) Detailed model of math research
The "raw materials" for AM's power may derive from the fact that it was based
on a detailed (and at least plausible) model for math research,
based on writings by Polya, Poincare, Hadamard, etc.
What are the peculiarities, the details of MATH research, that provide AM's
power? Why would it be weaker if not specific to math?
d) Implications for an automated mathematician
What constraints, design features, are required/suggested by the
domain-specific features of the model of math research?
(All the rest must therefore be random implementation hacks.)
Chapter 7: A Model for Creative Discovery in Science
<Similar to above, but generally applicable to any simulatable empirical science.
View AM as providing a suggestive model for how to emulate a researcher.
This model is of course a (modified) heuristic search paradigm.
In other words, now consider what constraints/suggestions are made simply by
deciding that we want a system which will do scientific theory formation.
This will isolate them from tose that were really specific to mathematics.
While the following material may all be included, it's order may be changed.
a) Scientific discovery as heuristic search: evolution of the model
Each node in the space corresponds to a concept
Concepts can be static (Sets) or Active (Composition)
A relnship. (e.g., a theorem) is itself a concept
An argument (e.g., a proof) is also considered a concept
The "legal" moves (ways to expand the tree of nodes, to grow new ones)
are too numerous to be seriously considered.
The real operators are themselves heuristic rules of thumb
So the "space" itself weeds out all nodes except those proposed for some
good heuristic reason.
As simple numerical calculations show, this space is still enormous.
Using the big-switch idea, we can restrict our attention to 1 domain
(e.g., math) and only use ITS nodes and heur. operators.
Even so, space is still too big
Refine the big-switch idea:
When worrying about nodes N1...,Nk, only consider heuristic
operators known to be rele. to those concepts.
Even so, the space is still too big
We recurse: we use heuristics to reduce our search. These new meta-heuristics
(strategies) guide our attention (which nodes to look at next,
which operators are most promising to apply to each selected node).
If these are good enuf, then we are through (else consider meta-meta-heurs.)
b) The model finally produced
Recapping, we state explicitly how we decide what to do next at any moment.
Indicate the data structures, the flavors of heuristics, control flow, etc.
Note that "apply heur. operator F and add corresponding
new node N" is considered primitive for the moment.
c) Evidence in favor of (empirical validation of) that simple model
i) A prediction: Character of interdisciplinary research
For a novice in some field to do new research, he must learn the rele.
already-known concepts, and (probably) must learn the rele. heurs.
(e.g.: bubble-chamber physics exeriments, molec. genetics)
On the other hand, if he has expertise in another field, he brings with
him many new heurs. to apply (hopefully a couple carry over and were
never applied before in this new field), plus he brings with him the
knowledge of a new net of concepts, from which he may draw analogies
Because of this, interdisciplinary research can be very productive
especially if you're the first such link (e.g.: Suppes)
ii) Turn the model upside-down: Analyzing a given discovery
When we hear a new discovery, we try to perceive a path backwards from
it, connecting to concepts we already know. The easier this is,
the less mystified we are by the discovery.
Consequence: Discoveries in alien field seem magical and v. hard
Consequence: Let-down after seeing how a magic trick is performed
Consequence: Let-down after seeing how an AI program really works.
Reasons why going backwards is probably easier than going forwards
Easier since you have a given starting point (the given discovery)
The alternative is to find the "right" set of known concepts
which will eventually lead to some interesting new concept.
Easier since the target space is huge (get to any known concept,
vs get to any very interesting concept)
Since nature is unkind, and very few avenues lead to
interesting new discoveries, valuable new concepts.
On the other hand, in working backwards with our heurs,
we worry about branching also, and must be able to quickly
tell if we are really simplifying the situation!
(i.e., Even if the model is right, we must herein worry
about the branching factor when going in reverse)
Easier if you are shown the discovery step by step
Since then you will know the necessary intermediate concepts
Since then each step will seem "easy" and obvious
Consequence: Reading journal article, feel "I could have done that"
Consequence: Work 3 years, make discovery, kick yourself for not
having seen it earlier, since it was so obvious.
Consequence: given a math or physics problem, you're more impressed
with the solution if you spend (waste?) a few hours trying
to solve it, than if you just read thru the soln. at once.
iii) "Failure" is due to missing some "right" heurs/concepts, or the wisdom
(i.e., strategies, meta-heuristics, etc.) to use them effectively.
Consequence: Teaching by example forces students to induce the
(meta-)heuristics themselves, often imperfectly.
E.g.: many GP's failed the antibiotic drug prescription test
Consequence: One can -- and should -- teach the strategies directly
<is this verified anywhere? counter-indicated? Polya? >
iv) Momentous occasion in science is typically due to discovery of a brand
new heuristic, or else due to creation of a new concept unconnected
to the existing concepts by some plausible operator.
Occasionally, all one finds is a daring interdisciplinary analogy.
Occasionally, just the first to follow a perfectly plaus. path.
Examples:
Non-Euclidean geometries ("Counter-intuitive systems may
still be consistent and interesting")
Relativity ("Counter-intu. sys. may even have physical
reality; simultaneity may be a local superstition)
Knuth's (Conway's) Surreal Numbers (no contacts, not
easy to see how/why their defn. was ever considered int.)
Schroedengier's wave equation (plucked out of thin air)
Methods of complex analysis (orig. based only on analogies)
Mendelev's periodic chart (first one to write down the
recently-discovered data about atomic wts. systematically)
< ... more examples ? ... >
v) Presence of Zeitgeist in Science: often, a discovery is made simultaneously
Because many researchers simult. hear some fresh concepts, and apply
the same bag of tricks.
Example: calculus (Newton, Leibnitz), ...
d) Questioning that model
i) Misleading character of polished results
It would seem, from reading texts and journals, that science itself
is more a flowing, smooth development, than a backtracking search.
This is illusory, as we scientists know. (we fend off this attack!)
ii) Omissions:
Serendipidy, incubation, unconscious, wealth of analogic materials
(mass of introspections from great scientists are mystical)
Error: accidental good fortune (Franklin using string, not wire;
Lederberg picking one of the few bacteria that really do
reproduce sexually; Galois forced to cram a lifetime of
creativity into one pre-duel burst of writing).
Inability to explain why "long-shot" investigations were undertaken.
(eg: superconductivity discovered as a Master's project)
Zeitgeist: popular trends/fads in science at the time;
Cultural and political themes popular at the time.
In CS: Heirarchy: Prussian army, bureaucracy
Cooperating modular experts
Tremendous dependence on existing hardware
(also true of numerical analysis)
Social taboos against experimenting with human subjects
Difficulty of generating new operators and unconnected nodes
Defense: this really IS rare, so again the model is OK
Rebut: But it DOES happen every now and then!! How?!
Focus of attention
People inherently are biassed in favor of recently-considered
nodes; they can't flit back and forth from one leaf to another.
This can be simulated in a heuristic search (by artificially
weighting the concepts and heuristics based on recency of
use), but it is not an inherent part of the model.
iii) Getting down to earth: limitations of the model
What kinds of creative activities is this a bad model for?
Brainstorming (intentional lack of plausibility)
Problem-solving (very specific goal)
Situations where there are very few heuristics (prop. calc)
Situations where the major difficulty is applying a heuristic
once it's chosen (e.g., soft fields like sociology)
Absurdly robust or delicate chains of discoveries
A delicate chain is really like problem solving
(there exists only 1 interesting concept in this
part of the space)
A very robust section of space needn't worry about
using heuristics (if every move produces something
very interesting)
Exactly what situations does it honestly capture?
Essentially undirected research in a very hard science,
with only secondary kinds of discoveries anticipated.
Many heuristic operators exist for any given node,
and a few meta-heuristics exist for the domain.
Where the percentage of int. nodes "out there" is low
(under 10%) but not negligible (exists only one!)
Pragmatic limitations
Most seriously, it treats "add new node N" as primitive.
In real life, people spend much time "adding a node".
They have to answer many questions about it, play with it,
and try to relate it to other known concepts. In this way,
the worth of the node is estimated, and new empirical data
is gathered which may trigger some heurs. to suggest the
next drection to take, the next concept/relnship to explore.
e) Fixing up the model
i) The most serious pragmatic limitation is this business about how much
work must be spent to fill in any new concept. Fix this up by
assuming that each concept has facets, not all of which need be
filled in at its conception. Then some heuristics (meta-heurs) can
be concerned with tasks at the level of filling in a facet
(deciding which facet of which concept to work on next). The basic
control structure can in fact be oriented around filling in facets.
If desired, the creation of new nodes can be a side effect!
Give diagrams illustrating all this.
Chapter 8: Concluding Discussion
a) What gives AM it's power? Identifying the crucial ideas.
b) Ultimately, what kinds of things could AM-like systems (never) do?
The need to do experiments means that the cost of a typical expt. must
be low; even better, the machine should be able to simulate and get answer.
c) Experiments on AM one can perform
d) Uses for AM-like systems: synergetic cooperation
e) Implications for math education.
f) Directions for Future Research
Appendices
Appendix 1:
The Theory of Maximally-Divisible Numbers
Appendix 2:
History of the "BEINGs" representation scheme
Appendix 3:
Some sample Concepts and Heuristics, as coded in LISP
Appendix 4:
Some traces of AM in action
Appendix 5:
Bibliography, Documentation, Acknowledgements